Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner

نویسندگان

  • Mennatallah Amer
  • Markus Goldstein
چکیده

Unsupervised anomaly detection is the process of finding outlying records in a given dataset without prior need for training. In this paper we introduce an anomaly detection extension for RapidMiner in order to assist non-experts with applying eight different nearest-neighbor and clustering based algorithms on their data. A focus on efficient implementation and smart parallelization guarantees its practical applicability. In the context of clustering-based anomaly detection, two new algorithms are introduced: First, a global variant of the cluster-based local outlier factor (CBLOF) is introduced which tries to compensate the shortcomings of the original method. Second, the local density cluster-based outlier factor (LDCOF) is introduced which takes the local variances of clusters into account. The performance of all algorithms have been evaluated on real world datasets from the UCI machine learning repository. The results reveal the strengths and weaknesses of the single algorithms and show that our proposed clustering based algorithms outperform CBLOF significantly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Anomaly detection in banking operations

This paper presents an overview of anomaly detection algorithms and methodology, focusing on the context of banking operations applications. The main principles of anomaly detection are first presented, followed by listing some of the areas in banking that can benefit from anomaly detection. We then discuss traditional nearest-neighbor and clustering-based approaches. Time series and other sequ...

متن کامل

FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA

Clustering of objects is an important area of research and application in variety of fields. In this paper we present a good technique for data clustering and application of this Technique for data clustering in a closed area. We compare this method with K-nearest neighbor and K-means.  

متن کامل

Histogram-based Outlier Score (HBOS): A fast Unsupervised Anomaly Detection Algorithm

Unsupervised anomaly detection is the process of nding outliers in data sets without prior training. In this paper, a histogrambased outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets and 10 s...

متن کامل

Edge Detection Based On Nearest Neighbor Linear Cellular Automata Rules and Fuzzy Rule Based System

 Edge Detection is an important task for sharpening the boundary of images to detect the region of interest. This paper applies a linear cellular automata rules and a Mamdani Fuzzy inference model for edge detection in both monochromatic and the RGB images. In the uniform cellular automata a transition matrix has been developed for edge detection. The Results have been compared to the ...

متن کامل

Edge Detection Based On Nearest Neighbor Linear Cellular Automata Rules and Fuzzy Rule Based System

 Edge Detection is an important task for sharpening the boundary of images to detect the region of interest. This paper applies a linear cellular automata rules and a Mamdani Fuzzy inference model for edge detection in both monochromatic and the RGB images. In the uniform cellular automata a transition matrix has been developed for edge detection. The Results have been compared to the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012